Evaluating bias due to data linkage error in electronic healthcare records

Authors

  • Katie Harron
  • Angie Wade
  • Ruth Gilbert
  • Berit Muller-Pebody
  • Harvey Goldstein
Abstract

BACKGROUND Linkage of electronic healthcare records is becoming increasingly important for research purposes. However, linkage error due to mis-recorded or missing identifiers can lead to biased results. We evaluated the impact of linkage error on estimated infection rates using two different methods for classifying links: highest-weight (HW) classification using probabilistic match weights and prior-informed imputation (PII) using match probabilities.

METHODS A gold-standard dataset was created through deterministic linkage of unique identifiers in admission data from two hospitals and infection data recorded at the hospital laboratories (original data). Unique identifiers were then removed and data were re-linked by date of birth, sex and Soundex using two classification methods: i) HW classification, which accepts the candidate record with the highest weight exceeding a threshold, and ii) PII, which imputes values from a match probability distribution. To evaluate methods for linking data with different error rates, non-random error and different match rates, we generated simulation data. Each set of simulated files was linked using both classification methods. Infection rates in the linked data were compared with those in the gold-standard data.

RESULTS In the original gold-standard data, 1496/20924 admissions linked to an infection. In the linked original data, PII provided the least biased results: 1481 and 1457 infections (upper/lower thresholds) compared with 1316 and 1287 (HW upper/lower thresholds). In the simulated data, substantial bias (up to 112%) was introduced when linkage error varied by hospital. Bias was also greater when the match rate was low or the identifier error rate was high, and in these cases PII performed better than HW classification at reducing bias due to false matches.

CONCLUSIONS This study highlights the importance of evaluating the potential impact of linkage error on results. PII can help incorporate linkage uncertainty into analysis and reduce bias due to linkage error, without requiring identifiers.
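The two classification rules and the bias comparison described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the details of either method, so the record fields (`weight`, `prob`, `value`), the function names, and the treatment of leftover probability mass as "no match" are all assumptions made for illustration. In particular, the PII stand-in below only draws from match probabilities and omits the prior that gives the method its name.

```python
import random


def hw_classify(candidates, threshold):
    """Highest-weight rule: accept the candidate with the largest match
    weight, but only if that weight exceeds the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["weight"])
    return best if best["weight"] > threshold else None


def pii_draw(candidates, rng, no_match_value=None):
    """Simplified stand-in for PII (assumption): draw the value of interest
    from the candidates' match probabilities instead of fixing one link.
    Any probability mass not covered by the candidates is treated as
    'no match'. The prior used by the real method is omitted here."""
    values = [c["value"] for c in candidates]
    probs = [c["prob"] for c in candidates]
    p_none = max(0.0, 1.0 - sum(probs))
    return rng.choices(values + [no_match_value],
                       weights=probs + [p_none], k=1)[0]


def relative_bias(estimated, gold):
    """Percentage difference between an estimate and the gold standard."""
    return 100.0 * (estimated - gold) / gold


# Hypothetical candidate records for a single admission.
candidates = [
    {"weight": 12.5, "prob": 0.7, "value": "A"},
    {"weight": 8.0, "prob": 0.2, "value": "B"},
]

print(hw_classify(candidates, threshold=10.0)["value"])  # prints "A"
print(pii_draw(candidates, random.Random(0)))

# Infection counts reported in the abstract (gold standard: 1496).
for label, count in [("PII", 1481), ("PII", 1457), ("HW", 1316), ("HW", 1287)]:
    print(label, count, f"{relative_bias(count, 1496):.1f}%")
```

Applied to the reported counts, the helper gives a relative bias of about -1.0% for the PII count of 1481 and about -12.0% for the HW count of 1316, which matches the abstract's statement that PII was the less biased method on the original data.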

Similar resources

Estimating Screening Test Utilization Using Electronic Health Records Data

BACKGROUND Electronic health records (EHRs) are increasingly used by medical providers and offer a wide-reaching source of information on utilization of preventive services. Numerous measures used for quality assessment and public reporting are estimated based on EHR data. However, sources of error and misclassification can lead to over- or under-estimation of true utilization rates. EHR-derive...

A guide to evaluating linkage quality for the analysis of linked data

Linked datasets are an important resource for epidemiological and clinical studies, but linkage error can lead to biased results. For data security reasons, linkage of personal identifiers is often performed by a third party, making it difficult for researchers to assess the quality of the linked dataset in the context of specific research questions. This is compounded by a lack of guidance on ...

Information Age, Electronic Health Record and Australia Healthcare

The emergence of the Internet has impacted the health information and the healthcare industry. The information revolution has reduced the distance between the healthcare providers and consumers. It permits easy dissemination of information and fast accessibility of data. Because of easy accessibility, privacy and confidentiality has become an important issue to be considered in the implementati...

Utilising identifier error variation in linkage of large administrative data sources

BACKGROUND Linkage of administrative data sources often relies on probabilistic methods using a set of common identifiers (e.g. sex, date of birth, postcode). Variation in data quality on an individual or organisational level (e.g. by hospital) can result in clustering of identifier errors, violating the assumption of independence between identifiers required for traditional probabilistic match...

Journal title:

Volume 14, Issue

Pages -

Publication date: 2014